Abstract: The girth of a matrix is the least number of linearly 
dependent columns, in contrast to the rank, which is the largest 
number of linearly independent columns. This paper considers 
the construction of high-girth matrices, whose probabilistic girth 
is close to their rank. Random matrices can be used to show the 
existence of high-girth matrices with constant relative rank, but 
the construction is non-explicit. This paper uses a polar-like 
construction to obtain a deterministic and efficient construction 
of high-girth matrices for arbitrary fields and relative ranks. 
Applications to coding and sparse recovery are discussed. 

I. Introduction 

Let A be a matrix over a field F. Assume that A is flat, 
i.e., it has more columns than rows. The rank of A, denoted 
by rank(A), is the maximal number of linearly independent 
columns. The girth of A, denoted by girth(A), is the least 
number of linearly dependent columns. What are the possible 
tradeoffs between rank(A) and girth(A)? This depends on the 
cardinality of the field. It is clear that 

girth(A) ≤ rank(A) + 1. (1) 

Is it possible to have a perfect-girth matrix that achieves this 
upper-bound? If F = R, drawing the matrix with i.i.d. standard 
Gaussian entries gives such an example with probability 1. 
However, if F = F_q, where q is finite, the problem is different. 
For F = F_q, note that 

girth(A) = dist(C_A), (2) 

where dist(C_A) is the distance of the q-ary linear code C_A 
whose parity-check matrix is A. In fact, the least number of 
columns that are linearly dependent in A is equal to the least 
number of columns whose linear combination can be made 0, 
which is equal to the least weight of a vector that is mapped 
to 0 by A, which is the least weight of a codeword, i.e., the 
code distance since the code is linear. 
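
The identity girth(A) = dist(C_A) can be checked on small binary examples. The following Python sketch is only an illustration and not part of the construction in this paper: the name `girth_gf2` and the encoding of each column as an integer bitmask are assumptions made for the example, and the search is exponential in the number of columns.

```python
from functools import reduce
from itertools import combinations
from operator import xor


def girth_gf2(columns):
    """Least number of linearly dependent columns over F_2.

    `columns` is a list of ints; the bits of each int are the entries of
    one column. Over F_2, a minimal dependent set of columns is one whose
    XOR is zero, so we search subsets by increasing size.
    Returns None when all columns are linearly independent.
    """
    for k in range(1, len(columns) + 1):
        for subset in combinations(columns, k):
            if reduce(xor, subset) == 0:
                return k
    return None


# Parity-check matrix of the [7,4] Hamming code: columns are 1, ..., 7 in binary.
hamming_columns = [1, 2, 3, 4, 5, 6, 7]
print(girth_gf2(hamming_columns))
```

For the Hamming example the function returns 3, which is the distance of that code; for the columns `[1, 2, 4, 7]` it returns 4 = rank + 1, so the bound (1) is met with equality.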

Hence, over finite fields, the girth is a key parameter for 
error-correcting codes, and studying the girth/rank tradeoffs for 
matrices is equivalent to studying the distance/dimension tradeoffs 
for linear codes. Clearly it is not possible to obtain perfect- 
girth matrices over F = F_2, even if we relax the perfect-girth 
requirement to be asymptotic, requiring rank(A) ~ girth(A) 
when the number of columns in A tends to infinity.^1 If F = F_2, 
the Gilbert-Varshamov bound provides a lower bound on the 
maximal girth (conjectured to be tight by some). Namely, 
for a uniformly drawn matrix A with n columns, with high 
probability, 

rank(A) = nH(girth(A)/n) + o(n), (3) 

where H is the binary entropy function. 

^1 We use the notation a_n ~ b_n for lim_{n→∞} a_n/b_n = 1. 

For F = F_q, the bound in (1) is a restatement of the 
Singleton bound for linear codes, expressed in terms of the 
co-dimension of the code. Asking for a perfect-girth matrix 
is hence equivalent to asking for an MDS linear code. Such 
constructions are known when q = n with Reed-Solomon codes. 
Note that the interest in MDS codes has recently resurged 
with applications in distributed data storage; see [5] for a 
survey. 

One may consider instead the case of non-finite fields, 
typically not covered in coding theory. As shown in Section 
IV-B, this is relevant for the recovery of sparse signals [6] 
via compressed measurements. The girth is then sometimes 
called differently, such as the Kruskal-rank or spark [6]. As 
stated above, for F = R, a random Gaussian matrix is perfect- 
girth with probability one. However, computing the girth of an 
arbitrary matrix is NP-hard [10] (like computing the distance 
of a code [11]), making the latter construction non-explicit. 

In this paper, we are mainly interested in the following 
notion of probabilistic girth, defined to be the least number of 
columns that are linearly dependent with high probability, when 
drawing the columns uniformly at random. Formal definitions 
are given in the next section. Going from a worst-case to a 
probabilistic model naturally allows for much better bounds. 
In particular, defining high-girth matrices as matrices whose 
probabilistic girth and rank are of the same order (up to o(n)), 
a random uniform matrix proves the existence of high-girth 
matrices even for F = F_2. However, obtaining an explicit 
construction is again non-trivial. 

In this paper, we obtain explicit and fully deterministic 
constructions of high-girth matrices over any fields and for any 
relative ranks. We rely on a polar-code-like construction. Starting 
with the same square matrix as for polar or Reed-Muller 
codes, i.e., the tensor-product/Sierpinski matrix, we then select 
rows with a different measure based on ranks. For finite fields, 
we show that high-girth matrices are equivalent to capacity- 
achieving linear codes for erasure channels, while for errors 
the speed of convergence of the probabilistic girth requirement 
matters. In particular, we achieve the Bhattacharyya bound 
with our explicit construction. For the real field, this allows us to 
construct explicit binary measurement matrices with optimal 
probabilistic girth. 

These results have various other implications. First, our 
construction gives an operational interpretation to the upper- 
bound of the Bhattacharyya-process in polar codes. When 
the channel is not the BEC, the upper-bound of this process 
used in the polar code literature is in fact the conditional rank 
process studied in this paper. Second, this paper gives a high- 
school level proof (not necessarily trivial but relying only on basic 
linear algebra concepts) of a fully deterministic, efficient, and 
capacity-achieving code for erasure channels. While capacity- 
achieving codes for the BEC are well-known by now, most 
constructions still rely on rather sophisticated tools (expander 
codes, polar codes, LDPC codes, spatially-coupled codes), and 
we felt that an explicit construction relying only on the notions 
of rank and girth is rather interesting. On the other hand, for 
F = F_2, our construction turns out to be equivalent to the 
polar code for the BEC, so that the difference is mainly about 
the approach. It does, however, simplify the concepts, not 
even requiring the notion of mutual information. Finally, we 
expect the result to generalize to non-binary alphabets, given 
that our construction does depend on the underlying field. 


For μ ∈ [0,1], we say that A_n is μ-high-girth if it is high-girth 
and girth*(A_n) = μ. 

Example 1. Consider the following construction, corresponding 
to Reed-Solomon codes. Let x_1, ..., x_n be distinct elements 
of a field F, and consider the m × n Vandermonde matrix V with 
entries V_{ij} = x_j^{i−1} for 1 ≤ i ≤ m and 1 ≤ j ≤ n. 
Then V will satisfy a stronger property than being high-girth, 
as its actual girth is m + 1: every m × m submatrix will be 
invertible, since every m × m submatrix is a Vandermonde 
matrix whose determinant must be nonzero. However, this 
example cannot be used to construct high-girth families over a 
fixed finite field F. For as soon as n exceeds the size of F, it 
will be impossible to pick distinct x_i's, and we will no longer 
have high girth. 

Example 2. A μn × n uniform random matrix with entries in 
F_2 is μ-high-girth with high probability. 


II. High-Girth Matrices 

A. Notation 

Let A be an m × n matrix over a field F. For any set S ⊆ [n], 
let A[S] be the submatrix of A obtained by selecting the 
columns of A indexed by S. For s ∈ [0,1], let A[s] be a 
random submatrix of A obtained by sampling each column 
independently with probability s. Thus, A[s] = A[S], where S 
is an i.i.d. Ber(s) random subset of [n]. In expectation, A[s] 
has sn columns. Throughout the paper, an event B_n takes place 
with high probability if P{B_n} → 1 when n → ∞, where n 
should be clear from the context. 

B. Probabilistic Girth 

Definition 1. Let {A_n} be a sequence of matrices over a field 
F, where A_n has n columns. The probabilistic girth of A_n 
is the supremum of all s ∈ [0,1] such that A_n[s] has linearly 
independent columns with high probability, i.e., 

girth*({A_n}) := sup{s ∈ [0,1] : P{A_n[s] has lin. indep. cols.} = 1 − o(1)}. (4) 

Note that a better name would have been the probabilistic 
relative girth, since it is a counterpart of the usual notion of girth 
in the probabilistic setting with, in addition, a normalization factor 
of n. We often write girth*(A_n) instead of girth*({A_n}). 
We will sometimes care about how fast the above probability 
tends to 1. We then say that A_n has a probabilistic girth with 
rate T(n) if the above definition holds when 

P{A_n[s] has lin. indep. columns} = 1 − T(n). (5) 
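
To make Definition 1 concrete, the probability in (5) can be estimated by sampling. The sketch below works over F_2 only and is an illustration under assumptions of our own: columns are encoded as integer bitmasks, and the names `rank_gf2` and `indep_probability` are hypothetical helpers, not notation from this paper.

```python
import random


def rank_gf2(vectors):
    """Rank over F_2 of a list of ints used as bit vectors."""
    basis = []
    for v in vectors:
        for b in basis:
            # XOR with b lowers v exactly when v contains b's leading bit.
            v = min(v, v ^ b)
        if v:
            basis.append(v)
    return len(basis)


def indep_probability(columns, s, trials=2000, seed=0):
    """Monte Carlo estimate of P{A[s] has linearly independent columns}."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        # Keep each column independently with probability s (an i.i.d. Ber(s) subset).
        sample = [c for c in columns if rng.random() < s]
        hits += rank_gf2(sample) == len(sample)
    return hits / trials
```

Scanning s over a grid and locating where the estimate drops away from 1 gives a rough empirical picture of the probabilistic girth for a fixed n; the definition itself is asymptotic in n.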

Definition 2. We say that A_n is high-girth if 

girth*(A_n) = lim sup_{n→∞} rank(A_n)/n. (6) 


III. Explicit construction of high-girth matrices 

A. Sierpinski matrices 

Let F be any field, let n be a power of 2, and let G_n be the 
matrix over F defined by 

G_n = \begin{pmatrix} 1 & 1 \\ 0 & 1 \end{pmatrix}^{⊗ log n}. 

Note that the entries of this matrix are only 0's and 1's, hence 
this can be viewed as a matrix over any field. 
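
A minimal Python sketch of this tensor-power construction follows. It assumes the 2 × 2 base matrix with rows (1, 1) and (0, 1), which is the form consistent with the row structure used in the proof of Lemma 1; the helper names `kron` and `sierpinski` are our own.

```python
def kron(a, b):
    """Kronecker product of two matrices given as lists of rows."""
    return [[x * y for x in row_a for y in row_b] for row_a in a for row_b in b]


def sierpinski(n):
    """G_n as the log2(n)-fold Kronecker power of [[1, 1], [0, 1]]."""
    if n < 1 or n & (n - 1):
        raise ValueError("n must be a power of 2")
    g = [[1]]
    while len(g) < n:
        g = kron(g, [[1, 1], [0, 1]])
    return g


for row in sierpinski(4):
    print(row)
```

For n = 4 the rows are (1,1,1,1), (0,1,0,1), (0,0,1,1), (0,0,0,1): odd-indexed rows repeat each entry of a row of G_2 twice, and even-indexed rows intersperse zeros.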

Many important codes can be derived from G_n, and they are 
all based on a simple idea. Namely, we first pick some measure 
of "goodness" on the rows of G_n. Then, we take the submatrix 
of G_n obtained by keeping only those rows which are the "best" 
under this metric, and we finally define a code whose parity-check 
matrix (PCM) is this matrix. The first important examples are 
Reed-Muller (RM) codes [8], [9], where goodness is measured by 
the weight of the rows, and more recently polar codes [2], [3], 
where goodness is measured by the entropy (or mutual information). 
In the next section, we define a measure of goodness based on ranks and 
use it to construct high-girth matrices. A similar construction 
was proposed in [7] for Hadamard matrices to polarize the 
Rényi information dimension. We discuss applications to coding 
and sparse recovery in the next sections. 

B. Conditional-rank matrices 

With s ∈ [0,1] fixed, let G_n^{(i)} denote the submatrix of G_n 
obtained by taking the first i rows, and let G_n^{(i)}[s] be the random 
submatrix obtained by sampling each column independently 
with probability s, as above. 

Definition 3. The conditional rank (COR) of row i in G_n is 
defined by 

ρ(n, i, s) = E(rank_F G_n^{(i)}[s]) − E(rank_F G_n^{(i−1)}[s]) (9) 

where rank_F denotes the rank computed over the field F. When 
i = 1, define 

ρ(n, 1, s) = E(rank_F G_n^{(1)}[s]). (10) 

Now, by adding the ith row, we will either keep the rank 
constant or increase it by 1, and the latter will happen if and 
only if the ith row is independent of the previous rows. So we 
get that 

ρ(n, i, s) = P(the ith row of G_n[s] is independent of the previous i − 1 rows), (11) 

where linear independence is also considered over F. The key 
property of the conditional ranks is expressed in the following 
lemma. 

Lemma 1. Define the functions 


Theorem 1 (Application of [4]). For any n, 

[...] = s + o(1) (15) 

[...] = (1 − s) + o(1) (16) 

Hence the theorem tells us that the above martingale 
polarizes very quickly: apart from a vanishing fraction, all 
ρ(n, i, s)'s are exponentially close to 0 or 1 as n → ∞. With 
this in mind, we define the following. 


Definition 4. Let n be a fixed power of 2, and let s ∈ [0,1] be 
fixed. Let H ⊆ [n] be the set of indices i for which ρ(n, i, s) > 
1 − 2^{−n^{[...]}}, and let m = |H|. By Theorem 1, we know that 
m = sn + o(n). Let R_n denote the m × n submatrix of G_n 
obtained by selecting all the columns of G_n, but only taking 
those rows indexed by H. We call R_n the COR matrix of size 
n with parameter s. 


ℓ(x) = 2x − x^2 (12) 

r(x) = x^2 (13) 

and define a branching process of depth log n and offspring 
2 (i.e., each node has exactly two descendants) as follows: 
the base node has value s, and for a node with value x, its 
left-hand child has value ℓ(x) and its right-hand child has 
value r(x). Then the n leaf-nodes of this branching process 
are, in order, the values ρ(n, i, s) for 1 ≤ i ≤ n. 
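
The branching process of Lemma 1 is straightforward to evaluate numerically. The sketch below is an illustration only; the function names `ell`, `r`, and `cor_leaves` are our own, and n is assumed to be a power of 2.

```python
def ell(x):
    """Left-child map l(x) = 2x - x^2."""
    return 2 * x - x * x


def r(x):
    """Right-child map r(x) = x^2."""
    return x * x


def cor_leaves(n, s):
    """Leaf values of the depth-log2(n) branching process, left to right."""
    values = [s]
    while len(values) < n:
        # Each node with value x is replaced by its children l(x), r(x), in order.
        values = [child for x in values for child in (ell(x), r(x))]
    return values


leaves = cor_leaves(4, 0.5)
print(leaves)                     # [0.9375, 0.5625, 0.4375, 0.0625]
print(sum(leaves) / len(leaves))  # 0.5
```

Since ℓ(x) + r(x) = 2x, the average of the leaves equals s at every depth, which the second printed value illustrates.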

An important point about this lemma is that the functions 
ℓ and r do not depend on F, while the ρ(n, i, s) values do, 
a priori. Thus, one way to interpret this lemma is that the 
expected conditional ranks of G_n do not depend on the field 
F, even though their definition does. The proof of Lemma 1 
is given in Section V. 

A key property of the branching process in Lemma 1 is that 
it is a balanced process, meaning that the average value of the 
two children of a node with value x is x again: 

(ℓ(x) + r(x))/2 = ((2x − x^2) + x^2)/2 = x. (14) 


Note that the construction of COR matrices is trivial, as 
opposed to the construction of polar codes based on the entropy 
or mutual information for general sources or channels. 

We will index the rows of R_n by i ∈ H, rather than j ∈ [m]. 
We sometimes denote R_n by R. The most important property 
of R_n is expressed in the following theorem. 

Theorem 2. For any s ∈ [0,1], R_n[s] has full rank (i.e., rank 
m) with high probability, as n → ∞. In fact, R_n[s] has full 
rank with probability 1 − O(2^{−n^{[...]}}). 

The proof is a simple consequence of Lemma 1 and Theorem 
1, and can be found in the Appendix. 

Theorem 2 implies the following. 

Theorem 3. For any s ∈ [0,1], R_n is s-high-girth. 

Since the proof of Theorem 2 works independently of the 
base field F, the same is true of Theorem 3. Thus, the COR 
construction is fully deterministic and works over any field. 
In fact, it requires only two values (0 and 1) for the matrix, 
even when F = R. 
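
The COR construction can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the paper's implementation: the parameter `eps` is a hypothetical stand-in for the exponentially small threshold of Definition 4, the base matrix is taken to have rows (1, 1) and (0, 1), ranks are computed over F_2 only, and all function names are our own.

```python
import random


def cor_values(n, s):
    """Leaves of the branching process with l(x) = 2x - x^2 and r(x) = x^2."""
    values = [s]
    while len(values) < n:
        values = [y for x in values for y in (2 * x - x * x, x * x)]
    return values


def sierpinski_rows(n):
    """Rows of G_n as integer bitmasks (bit j is the entry in column j)."""
    g = [[1]]
    while len(g) < n:
        g = [[a * b for a in ra for b in rb] for ra in g for rb in ((1, 1), (0, 1))]
    return [sum(bit << j for j, bit in enumerate(row)) for row in g]


def rank_gf2(vectors):
    """Rank over F_2 of a list of ints used as bit vectors."""
    basis = []
    for v in vectors:
        for b in basis:
            v = min(v, v ^ b)  # clears b's leading bit in v when it is set
        if v:
            basis.append(v)
    return len(basis)


def cor_rows(n, s, eps):
    """0-based indices of the rows kept: those with conditional rank > 1 - eps."""
    return [i for i, rho in enumerate(cor_values(n, s)) if rho > 1 - eps]


def full_rank_frequency(n, s, eps, trials=200, seed=0):
    """Fraction of sampled column sets for which the COR matrix has rank m."""
    rng = random.Random(seed)
    g = sierpinski_rows(n)
    kept = [g[i] for i in cor_rows(n, s, eps)]
    hits = 0
    for _ in range(trials):
        # Bitmask of the columns sampled independently with probability s.
        mask = sum(1 << j for j in range(n) if rng.random() < s)
        hits += rank_gf2([row & mask for row in kept]) == len(kept)
    return hits / trials
```

Calling `full_rank_frequency(256, 0.5, 1e-3)` gives an empirical counterpart of Theorem 2 for one block length; how close the frequency is to 1 depends on the choice of `eps`.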

IV. Applications of high-girth matrices 


This means that this branching process defines a martingale, 
by letting a random walk go left or right with probability one half. 
Moreover, since ρ(n, i, s) is a probability, we have that this 
martingale stays in [0,1]. So by Doob's martingale convergence 
theorem, we must have that this martingale converges almost 
surely to its fixed points. In fact, Doob's theorem is not 
needed here, as one may conclude using only the fact that the 
increments are orthogonal.^2 Its fixed points are those x's for 
which ℓ(x) = r(x) = x. The only points satisfying this are 
0 and 1, so this martingale polarizes. In fact, much can be 
said about the speed of polarization of this process, as it is 
equivalent to the polarization process for BEC channels studied 
in [4]. 

^2 Private discussion with E. Telatar. See also [1]. 


A. Coding for erasures 

Let F be a field and p ∈ [0,1]. The memoryless erasure channel 
on F with erasure probability p, denoted by MEC(p), erases 
each component of a codeword on F independently with 
probability p. Denoting by e the erasure symbol, the output 
alphabet is hence F* = F ∪ {e} and the transition probability 
of receiving y ∈ F* when x ∈ F is transmitted is 

W(y|x) = p if y = e, and W(y|x) = 1 − p if y = x. (17) 

The memoryless extension is defined by W^n(y^n|x^n) = 
∏_{i=1}^{n} W(y_i|x_i) for x^n ∈ F^n, y^n ∈ (F*)^n. 

Recall that a code of block length n and dimension k over 
the alphabet F is a subset of F^n of cardinality |F|^k. The 
code is linear if the subset is a subspace of dimension k. In 
particular, a linear code can be expressed as the image of a 
generator matrix G or as the null space of a parity-check 
matrix H. The rate of a code is defined by 
k/n. A rate R is achievable over the MEC(p) if the code can 
correct the erasures with high probability. More specifically, 
R is achievable if there exists a sequence of codes C_n of 
blocklength n and dimension k_n having rate R, and decoders 
D_n : (F*)^n → F^n, such that P_e(C_n) → 0, where, for x^n drawn 
uniformly at random in C_n and y^n the output of x^n over the 
MEC(p), 

P_e(C_n) := P{D_n(y^n) ≠ x^n}. (18) 

The dependency on D_n is not explicitly stated in P_e as there 
is no degree of freedom to decode over the MEC (besides 
guessing the erasure symbol), as shown in the proof of the next 
lemma. 

The supremum of the achievable rates is the capacity, given 
by 1 − p. We now relate capacity-achieving codes on the MEC 
and high-girth matrices. 

Lemma 2. A linear code C_n achieves a rate R on the MEC(p) 
if and only if its parity-check matrix has probabilistic girth at 
least 1 − R. In particular, a code achieves capacity on the 
MEC(p) if and only if its parity-check matrix is p-high-girth. 

In particular, the linear code whose parity-check matrix is a 
COR matrix of parameter p achieves capacity on the MEC(p). 


k-sparse from Ax, it must be that Ax ≠ Ax' for all x, x' 
that are k-sparse (and different). Hence A needs^3 to have girth 
2k + 1. 

One may instead consider a probabilistic model where a k- 
sparse signal has a random support, drawn uniformly at random 
or from an i.i.d. model where each component in [n] belongs 
to the support with probability p = k/n. The goal is then to 
construct a flat matrix A which allows one to recover k-sparse 
signals with high probability on the drawing of the support. 
Note that a bad support S is one which is k-sparse and that 
can be paired with another k-sparse support S' such that 
there exist real-valued vectors x, x' supported respectively on 
S, S' which have the same image through A, i.e., 

Ax = Ax' ⟺ A(x − x') = 0. (19) 

Note now that this is equivalent to saying that the columns of 
A indexed by S ∪ S' are linearly dependent, since x − x' is 
supported on S ∪ S', which is 2k-sparse. 

Hence, the probability of error for sparse recovery is given 
by 

P_S{∃ S' : A[S ∪ S'] has lin. dep. columns}. (20) 

This error probability can be upper-bounded as for errors (see 
next section and Section VII), by estimating the probability 
that A has a subset of up to 2k linearly dependent columns, 
which relies on the high-girth property of A. 


Remark 1. In the binary case, COR codes give a new 
interpretation to BEC polar codes: instead of computing the 
mutual information of the polarized channels via the generator 
matrix, we can interpret BEC polar codes from the girth of 
the parity-check matrix. Note that this reduces the proof 
that BEC polar codes achieve capacity to high-school linear 
algebra: mutual information need not even be introduced. 
The only part which may not be of a high-school level is 
the martingale argument, which is in fact not necessary, as 
already known in the polar code literature (see for example [1, 
Homework 4], which uses only basic algebra). 

Remark 2. As shown in [2], the action of the matrix G_n = 
\begin{pmatrix} 1 & 1 \\ 0 & 1 \end{pmatrix}^{⊗ log n} on a vector can be computed in O(n log n) time 
as well, which means that the encoding of the COR code can 
be done in O(n log n) time, as well as the code construction 
(which is not the case for general polar codes). Decoding 
the COR code can be done by inverting the submatrix of A 
corresponding to the indices that do not have erasure symbols, 
which can be done by Gaussian elimination in O(n^3) time. 
Alternatively, COR codes can be decoded as polar codes, i.e., 
successively, in O(n log n) time. Hence, like polar codes for the 
BEC, COR codes are deterministic, capacity-achieving, and 
efficiently encodable and decodable for the MEC. 
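
The Gaussian-elimination decoder mentioned in Remark 2 can be sketched as follows over F_2. This is one possible formulation, chosen for illustration: it solves the parity-check equations for the erased coordinates, and it assumes the received word is a codeword with some positions replaced by `None`. The function name `decode_erasures` and the data layout are our own.

```python
def decode_erasures(H, y):
    """Fill in the erasures of y (entries equal to None) using parity checks.

    H is a list of rows of 0/1 entries; y is a list of 0/1/None values.
    Returns the decoded word, or None when the columns of H indexed by the
    erased positions are linearly dependent (decoding is then ambiguous).
    """
    erased = [j for j, v in enumerate(y) if v is None]
    # Each check h gives: sum over erased j of h[j] x_j = sum over known j of h[j] y_j.
    rows = []
    for h in H:
        rhs = 0
        for j, v in enumerate(y):
            if v is not None:
                rhs ^= h[j] & v
        rows.append([h[j] for j in erased] + [rhs])
    # Gauss-Jordan elimination over F_2 on the augmented system.
    pivot_row = 0
    for col in range(len(erased)):
        pr = next((i for i in range(pivot_row, len(rows)) if rows[i][col]), None)
        if pr is None:
            return None
        rows[pivot_row], rows[pr] = rows[pr], rows[pivot_row]
        for i in range(len(rows)):
            if i != pivot_row and rows[i][col]:
                rows[i] = [a ^ b for a, b in zip(rows[i], rows[pivot_row])]
        pivot_row += 1
    x = list(y)
    for k, j in enumerate(erased):
        x[j] = rows[k][-1]
    return x
```

Decoding succeeds exactly when the erased columns of the parity-check matrix are linearly independent, which is the event analysed in Lemma 2.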

B. Sparse recovery 

In the setting of sparse recovery, one wishes to recover a 
real-valued sparse signal from a lower-dimensional projection 
[6]. In the worst-case model, a k-sparse signal is a vector 
with at most k non-zero components, and to recover x that is 


C. Coding for errors 

In this section, we work over the binary field F_2. The 
binary symmetric channel with error probability p, denoted by 
BSC(p), flips each bit independently with probability p. More 
formally, the transition probability of receiving y ∈ F_2 when 
x ∈ F_2 is transmitted is given by 

W(y|x) = p if y ≠ x, and W(y|x) = 1 − p if y = x. (21) 

The memoryless extension is then defined by W^n(y^n|x^n) = 
∏_{i=1}^{n} W(y_i|x_i), for x^n, y^n ∈ F_2^n. 

Theorem 4. Let p ∈ [0,1/2] and s = s(p) = 2√(p(1 − p)) be 
the Bhattacharyya parameter of the BSC(p). Let {C_n} be 
the COR code with parameter s (the code whose PCM is R_n). 
Then C_n can reliably communicate over the BSC(p) with high 
probability, as n → ∞. 


Note that unlike in the erasure scenario, Theorem 4 does 
not allow us to achieve capacity over the BSC(p). For the 
capacity of the BSC(p) is 1 − H(p), and 

1 − H(p) ≥ 1 − 2√(p(1 − p)) (22) 

with equality holding only for p ∈ {0, 1/2, 1}. 
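
The gap in (22) between the BSC capacity and the rate guaranteed by Theorem 4 is easy to tabulate. The following Python sketch is only a numerical illustration; the function names are our own.

```python
from math import log2, sqrt


def binary_entropy(p):
    """Binary entropy H(p) in bits, with H(0) = H(1) = 0."""
    if p in (0.0, 1.0):
        return 0.0
    return -p * log2(p) - (1 - p) * log2(1 - p)


def bsc_capacity(p):
    """Capacity 1 - H(p) of the BSC(p)."""
    return 1 - binary_entropy(p)


def cor_rate(p):
    """Rate 1 - 2 sqrt(p(1 - p)) on the right-hand side of (22)."""
    return 1 - 2 * sqrt(p * (1 - p))


for p in (0.01, 0.05, 0.11, 0.25, 0.4):
    print(f"p={p:.2f}  capacity={bsc_capacity(p):.4f}  COR rate={cor_rate(p):.4f}")
```

On this grid the capacity is strictly larger than the COR rate, and the two coincide (both equal 0) at p = 1/2.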

This statement, unlike the ones for erasure correction and 
sparse recovery, was stated only for the COR code, and not for 
general high-girth codes. Our proof of this statement, which 
can be found in the Appendix, relies on the actual construction 
of COR codes and on the upper bound on the successive 
probability of error in terms of the COR known from polar 
codes. One may also attempt to obtain this result solely from 
the high-girth property, but this requires further dependencies 
on the high-girth rate of convergence (see Section VII). It is 
an interesting problem to obtain achievable rates that solely 
depend on the probabilistic girth for the BSC. 

^3 Note that for noise stability or to obtain a convex relaxation of the decoder, 
one needs the columns to have in addition singular values close to 1, i.e., the 
restricted isometry property (RIP). 

V. Some proofs 

Proof of Lemma 1: We induct on log n. The base case is 
log n = 1, where the calculation is straightforward. The rank 
of G_2^{(1)}[s] will be 0 if no columns are chosen, and will be 1 
if at least 1 column is chosen. Therefore, 

ρ(2, 1, s) = E(rank_F(G_2^{(1)}[s])) (23) 

= 0 · (1 − s)^2 + 1 · 2s(1 − s) + 1 · s^2 (24) 

= 2s − s^2 = ℓ(s). (25) 

Similarly, E(rank_F(G_2^{(2)}[s])) = 2s, and thus 

ρ(2, 2, s) = (2s) − (2s − s^2) = s^2 = r(s). (26) 

Note that all these calculations do not actually depend on F. 

For the inductive step, assume that ρ(n/2, i, s) is the leaf 
value of the branching process for all 1 ≤ i ≤ n/2. To prove 
the same for ρ(n, i, s), write 

G_n = G_{n/2} ⊗ G_2. 

In other words, we think of G_n as being an (n/2) × (n/2) 
matrix whose entries are 2 × 2 matrices. 

We begin with the case when i is odd. By the inductive 
hypothesis, we wish to prove that 

ρ(n, i, s) = ℓ(ρ(n/2, (i + 1)/2, s)). (27) 

We partition the columns of G_n^{(i)}[s] into two sets: O, which 
consists of those columns which have an odd index in G_n, and 
E, which consists of those with an even index in G_n. Since 
G_n = G_{n/2} ⊗ G_2, we see that for i odd, the ith row of G_n 
is the ((i + 1)/2)th row of G_{n/2}, except that each entry is 
repeated twice. From this, and from inclusion-exclusion, we 
see that 

ρ(n, i, s) = P(row i of G_n[s] is independent of the previous rows) (28) 

= P(row i of G_n[O] is independent of the previous rows of G_n[O]) 
+ P(row i of G_n[E] is independent of the previous rows of G_n[E]) 
− P(both of the above) (29) 

= 2ρ(n/2, (i + 1)/2, s) − ρ(n/2, (i + 1)/2, s)^2 (30) 

= ℓ(ρ(n/2, (i + 1)/2, s)). (31) 

Next, we consider the case when i is even. In this case, we 
wish to prove that 

ρ(n, i, s) = r(ρ(n/2, i/2, s)). (32) 

We proceed analogously. From the equation G_n = G_{n/2} ⊗ G_2, 
we see that the ith row of G_n is the (i/2)th row of G_{n/2}, 
except with a 0 interspersed between every two entries. Thus, 
the ith row will be dependent if either its restriction to O or 
its restriction to E is dependent; in other words, it will be 
independent if and only if both the restriction to O and the 
restriction to E are independent. Therefore, 


ρ(n, i, s) = P(the restriction to O and the restriction to E are independent) (33) 

= P(the restriction to O is independent) · P(the restriction to E is independent) (34) 

= ρ(n/2, i/2, s)^2 (35) 

= r(ρ(n/2, i/2, s)). (36) 

Note that in all our calculations, we used probability arguments 
that are valid over any field. Broadly speaking, this works 
because the above arguments show that the only sorts of linear 
dependence that can be found in G_n^{(i)}[s] involve coefficients 
in {−1, 0, 1}. Since these elements are found in any field, we 
have that the lemma is true for all fields F. ■ 
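
For small n, Lemma 1 can be verified exactly by enumerating all column subsets. The Python sketch below does this over F_2 with rational arithmetic. It is an illustration under our own assumptions (base matrix with rows (1, 1) and (0, 1), rows stored as bitmasks, hypothetical function names) and it only checks the binary field.

```python
from fractions import Fraction


def cor_values(n, s):
    """Leaves of the branching process with l(x) = 2x - x^2 and r(x) = x^2."""
    values = [s]
    while len(values) < n:
        values = [y for x in values for y in (2 * x - x * x, x * x)]
    return values


def sierpinski_rows(n):
    """Rows of G_n as integer bitmasks (bit j is the entry in column j)."""
    g = [[1]]
    while len(g) < n:
        g = [[a * b for a in ra for b in rb] for ra in g for rb in ((1, 1), (0, 1))]
    return [sum(bit << j for j, bit in enumerate(row)) for row in g]


def rank_gf2(vectors):
    """Rank over F_2 of a list of ints used as bit vectors."""
    basis = []
    for v in vectors:
        for b in basis:
            v = min(v, v ^ b)
        if v:
            basis.append(v)
    return len(basis)


def exact_conditional_ranks(n, s):
    """Exact rho(n, i, s) over F_2, from the rank increments in (9) and (10)."""
    rows = sierpinski_rows(n)
    rho = [Fraction(0)] * n
    for mask in range(1 << n):  # mask encodes the sampled column set S
        k = bin(mask).count("1")
        weight = s ** k * (1 - s) ** (n - k)  # P(S) under i.i.d. Ber(s) sampling
        prev = 0
        for i in range(n):
            cur = rank_gf2([row & mask for row in rows[: i + 1]])
            rho[i] += weight * (cur - prev)
            prev = cur
    return rho


s = Fraction(1, 3)
print(exact_conditional_ranks(4, s) == cor_values(4, s))
```

The enumeration costs 2^n rank computations per row, so it is only practical for very small n; its purpose is to confirm the leaf ordering, not to compute conditional ranks in general.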

Proof of Lemma 2: Note that a decoder over the MEC 
needs to correct the erasures, but there is no bias towards which 
symbol can have been erased. Hence, a decoder on the MEC 
is wrong with probability at least one half if there are multiple 
codewords that match the corrupted word. In other words, the 
probability of error is given by^4 

P_e(C_n) = P_E{∃ x, y ∈ C_n, x ≠ y, x[E^c] = y[E^c]} (37) 

= P_E{∃ z ∈ C_n, z ≠ 0, z[E^c] = 0} (38) 

where E is the erasure pattern of the MEC(p), i.e., a 
random subset of [n] obtained by picking each element with 
probability p. Let H_n be the parity-check matrix of C_n, i.e., 
C_n = ker(H_n). Note that E has the property that there exists 
a nonzero codeword z ∈ C_n such that z[E^c] = 0 if and only if the 
columns indexed by E in H_n are linearly dependent. Indeed, 
assume first that there exists such a codeword z, where the 
support of z is contained in E. Since z is in the kernel of H_n, 
the columns of H_n indexed by the support of z must add up 
to 0, hence any set of columns that contains the support of z 
must be linearly dependent. Conversely, if the columns of H_n 
indexed by E are linearly dependent, then there exists a subset 
of these columns and a collection of coefficients in F such 
that this linear combination is 0, which defines the support of 
a codeword z. Hence, 

P_e(C_n) = P_E{H_n[E] has lin. dependent columns}. (39) 


^4 If ties are broken at random, an additional factor of 1 − 1/|F| should 
appear on the right-hand side. 

Recalling that the code rate is given by 1 − r, where r is the 
relative rank of the parity-check matrix, the conclusions follow. 

VI. Appendix 


A. More Proofs 

Proof of Theorem 2: For i ∈ H, let B_i be the event 
that the ith row of R[s] is linearly dependent on the previous 
rows. Note that if R[s] has full rank, then no B_i is satisfied, 
while if R[s] has non-full rank, then there must be some linear 
dependence in the rows, so at least one B_i will be satisfied. In 
other words, the event whose probability we want to calculate 
is simply the event ∩_{i∈H} B_i^c. 

Note that in our notation, the ith row of R is also the ith row 
of G_n. Therefore, for any S ⊆ [n], the ith row of R[S] is the ith 
row of G_n[S]. This means that any linear dependence between 
the ith row of R[S] and the previous rows automatically induces 
a linear dependence between the ith row of G_n[S] and the 
previous i − 1 rows, since the previous rows in G_n[S] are a 
superset of the previous rows in R[S]. Since this is true for 
any set S ⊆ [n], we see that 

P(B_i) = P(the ith row of R[s] is dependent on the previous rows of R[s]) (40) 

≤ P(the ith row of G_n[s] is dependent on the previous rows of G_n[s]) (41) 

= 1 − ρ(n, i, s) (42) 

< 2^{−n^{[...]}}, (43) 

where the last step uses the definition of H. 

Therefore, 

P(∩_{i∈H} B_i^c) = 1 − P(∪_{i∈H} B_i) (44) 

≥ 1 − Σ_{i∈H} P(B_i) (45) 

≥ 1 − Σ_{i∈H} 2^{−n^{[...]}} (46) 

= 1 − m 2^{−n^{[...]}} (47) 

= 1 − O(2^{−n^{[...]}}) (48) 

→ 1 as n → ∞. (49) 


Proof of Theorem 4: We recall from [2] that in any code 
generated by taking some of the rows of G_n, we have that the 
probability of error on the BSC(p) is upper-bounded as 

P_e ≤ Σ_{i∈H^c} Z_n^{(i)} (50) 

where H denotes the set of rows of G_n that we keep and Z_n^{(i)} 
denotes the Bhattacharyya parameter of the ith row. Proposition 
5 in [2] tells us that 

Z_n^{(i)} ≤ (Z_{n/2}^{(i/2)})^2 when i is even, 
Z_n^{(i)} ≤ 2 Z_{n/2}^{((i+1)/2)} − (Z_{n/2}^{((i+1)/2)})^2 when i is odd. (51) 

We recognize these functions as r and ℓ. Thus, we see that 
the branching process of Lemma 1, when initialized at s = 
Z(BSC(p)), provides an upper bound for the Bhattacharyya 
parameters. Now, recall that the row selection criterion for COR 
matrices only keeps the rows with high ρ values, and thus high 
Z values. Thus, we see that (50) ensures that COR codes with 
parameter s = 2√(p(1 − p)) can successfully transmit over the 
BSC(p). ■ 


VII. Errors from high-girth matrices 

It is interesting to study what rate in the high-girth 
property allows one to achieve positive rates on the BSC. A classic 
union bound requires at least an exponential rate, as explained 
next. This underlines that COR matrices achieve rates higher 
than what arbitrary high-girth matrices may reach. 

Let H = H_n be a matrix with probabilistic girth μ. Consider 
C to be the code whose parity-check matrix is H. The 
probability of error of this code on the BSC(p) is the probability 
that an error vector Z has in its coset (i.e., the other error 
vectors that lead to the same syndrome) a more likely vector, 
i.e., 

P_e = P_Z{∃ z' ∈ F_2^n : HZ = Hz', w(z') < w(Z)} (52) 

where Z is i.i.d. Bernoulli(p). This is equivalent to 

P_e = P_Z{∃ x ∈ F_2^n : Hx = 0, w(x + Z) < w(Z)}. (53) 

Note that w(x + Z) < w(Z) means that Z takes more often 
the value 1 in the support of x, which has probability at most 

Σ_{k=w/2}^{w} \binom{w}{k} p^k (1 − p)^{w−k} (54) 

where w = w(x) and w/2 is rounded up if w is not even. Note that 

Σ_{k=w/2}^{w} \binom{w}{k} p^k (1 − p)^{w−k} (55) 

≤ p^{w/2} (1 − p)^{w/2} 2^w = z(p)^w, (56) 

where 

z(p) = 2 p^{1/2} (1 − p)^{1/2} (57) 

is the Bhattacharyya parameter of the channel. Hence, 

P_e ≤ Σ_{w≥1} N(w) z(p)^w (58) 

where N(w) is the number of codewords of weight w, i.e., 

N(w) = |{x : Hx = 0, w(x) = w}|. (59) 
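
The bound that turns the binomial tail in (55) into z(p)^w in (56) can be checked numerically for p ≤ 1/2. The sketch below is an illustration only; the function names `majority_flip_prob` and `bhattacharyya` are our own.

```python
from math import comb, sqrt


def majority_flip_prob(w, p):
    """Binomial tail: P(at least ceil(w/2) of w i.i.d. Bernoulli(p) bits equal 1)."""
    start = (w + 1) // 2  # w/2 rounded up
    return sum(comb(w, k) * p ** k * (1 - p) ** (w - k) for k in range(start, w + 1))


def bhattacharyya(p):
    """z(p) = 2 sqrt(p (1 - p)), the Bhattacharyya parameter of the BSC(p)."""
    return 2 * sqrt(p * (1 - p))


for w in (1, 2, 5, 10, 20):
    p = 0.1
    print(w, majority_flip_prob(w, p), bhattacharyya(p) ** w)
```

For p ≤ 1/2 each term with k ≥ w/2 satisfies p^k (1 − p)^{w−k} ≤ (p(1 − p))^{w/2}, and the binomial coefficients sum to at most 2^w, which is the reasoning behind (56).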

Note that 

N(w) ≤ |{x : H[x] has lin. dep. col., w(x) = w}| (60) 

= \binom{n}{w} P_S{H[S] has lin. dep. col.}, (61) 

where S is uniformly drawn with a support of size w. By 
standard concentration arguments, this is upper-bounded by 
P{H[s] has lin. dep. col.} where s = w/n + o(√(w/n)), and 
by assumption, 

P{H[s] has lin. dep. col.} = T(n) (62) 

when s < μ. Thus, 

P_e ≤ Σ_{t<μ} T(n) 2^{n(H(t) + t log(z(p)))} + Σ_{t≥μ} N(tn) z(p)^{tn} (63) 

where t takes values such that tn is an integer. Hence T(n) 
must be exponentially small to drive the first term to 0. 




